This workshop will provide an introduction to R!
R is a popular programming language that many researchers use for organizing data, visualizing data, and carrying out statistical analyses.
By the end of the workshop series, my hope is that you will feel comfortable enough to work independently in R!
[Coding expectations versus reality]
Before the workshop:
Download and install R from CRAN (the Comprehensive R Archive Network). CRAN and its mirror sites host the R programming language itself, which RStudio needs in order to run. https://cran.r-project.org/
Download RStudio, which is the main software that we will be using to work with R. https://posit.co/download/rstudio-desktop/
Download the ‘CABLAB R Workshop’ folder from the CABLAB R Workshop Github. This folder contains all the files we will be working with during this workshop.
To get started, open RStudio. Then, let’s open a new R Markdown document by clicking File > New File > R Markdown…
This should produce a dialogue box where you can enter the name of the script and your name before selecting OK.
Next, let’s clear out all of the default text that appears in a new R Markdown document, which I have highlighted below:
In a typical coding script, every line must contain code that the language can interpret. If you want to include notes, you have to put a hash mark (#) before them so that the program knows to ignore that text. That means every note we leave ourselves has to start with a hash mark, which can get a bit annoying. An R Markdown document can do everything a typical coding script does, but it is more user-friendly.
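Here is a small sketch of how hash marks behave inside R code. It uses the `<-` arrow to store a result in an object named `answer` (a name chosen just for illustration); assignment is covered properly later in the workshop. The commented-out line never runs, so `answer` keeps the value from the line above it.

```r
# This whole line is a comment, so R skips it entirely
answer <- 2 + 2    # R runs the code before the hash mark and ignores this note
# answer <- 3 + 3  # this line is commented out, so it never runs
answer             # prints the stored value
```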
With R Markdown, any code that you would like R to interpret belongs in the coding chunk as illustrated below!
If we want to leave notes, we don’t have to “comment it out”. We can just write long-winded narration that can help others understand why we coded what we coded and what that code does.
Below are examples of R accepting notes when they are commented out and rejecting notes when they are not!
That’s because a typical script will interpret any text as a command unless the text is marked with a hash mark (#). An R Markdown document only interprets text as code when we tell it to, and we do that by creating a chunk. A chunk starts with three backticks (```) followed by {r}, and it ends with three more backticks on a separate line.
A typical R script can’t make sense of chunks, though; we need an R Markdown document for that. You might be thinking that separating code from non-code by hand seems like extra work. It is, a little, but it can also be a lot more convenient, because the output of each chunk appears directly beneath that chunk in the editor. By output, we just mean the product, sum, or status of whatever calculation or item you asked R to compute and show you.
R Markdown gives us greater control over what we see and when we see it. To demonstrate, let’s create a new chunk in our markdown document and enter what we see in the image above. You can then follow along with the next bit:
2 + 2
## [1] 4
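A chunk is not limited to one line. As a quick sketch, the chunk below holds several arithmetic expressions; when you run it, R prints each result on its own `[1]` line beneath the chunk, in order.

```r
# Several expressions in one chunk; R prints each result in turn
10 - 3     # subtraction
6 * 7      # multiplication
2^3        # exponent (2 to the power of 3)
17 %/% 5   # integer division (how many whole times 5 fits into 17)
17 %% 5    # remainder after that division (modulo)
```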
With a typical script, if we want to know the output of a line we ran a while ago, we either have to rerun it or scroll through the console to find it. With R Markdown, we can collapse entire chunks and their output by using the minimization arrow [ Minimization Arrow ] on the left side of the window.
If we want to hide output, we can use the expand/collapse button [ Minimize Command ] on the right side of the output window.
We can choose exactly which chunk to run using the “Run” command [ Run Command ] in the upper right corner of the chunk.
Also of note, the down-facing arrow (the second icon in the upper right corner of the chunk) tells R “Run all of the chunks above this one” [ Run All Chunks Command ]. This is helpful if you make a mistake and don’t want to manually rerun all of the previous chunks one by one to get back to where you were. It also makes your code very easy for other people to run: they can quite literally do it with the click of a button!
If we click the cog icon in the same tray, we can access the output options and manipulate where output appears and what it looks like, but that’s beyond the scope of this workshop [ Settings Command ].
Packages in R are analogous to libraries in other languages. They are bundles of functions and shortcuts that someone else has already programmed to save us some work. Somebody else already figured out a very quick way to compute a regression, so now we don’t have to! We just use their tools to do it.
Most packages are hosted on CRAN, R’s central package repository, so even though thousands of people develop packages independently, you don’t need to leave R to find and install them. Before a package can be used, it must be installed, which you can do pretty simply:
install.packages("PACKAGENAME")
If you need to update a package, you can just re-run the code above, which installs the latest available version. In RStudio, you can also see a list of your installed packages and their descriptions in the ‘Packages’ tab of your Viewer Window.
Now we’ve installed a package, but that doesn’t mean we can use it yet. We need to tell R “I want access to this package’s functions during this session” by loading it with the library() command.
library(PACKAGENAME)
Notice that we drop the quotation marks now. We just specify the (case-sensitive) package name, which lets R know we plan on using that package this session.
You might be wondering why we need to take this extra step. Sometimes different packages use the same function names, so having more than one of them loaded at the same time could confuse R (when this happens, R will usually tell you that one function is “masking” another). Packages can also take up a lot of memory, so loading ALL of your packages at once could slow your computer down considerably. Most languages work the same way.
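One way around name clashes is the `package::function` form, which tells R exactly which package's function you mean. The sketch below uses `filter()` from the stats package, which comes with every R installation. dplyr also has a function called `filter()`, and loading dplyr typically prints a message that it masks `stats::filter`. Writing `stats::filter` stays unambiguous either way. `conflicts()` is a base R function that lists names found in more than one attached package.

```r
x <- c(1, 2, 3, 4, 5)

# stats::filter() names the package explicitly, so R never has to guess
# which filter() we mean; here it computes a 3-point moving average
moving_avg <- stats::filter(x, rep(1/3, 3))
moving_avg   # the first and last values are NA (no neighbours on one side)

# List object names that exist in more than one attached package
conflicts()
```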
If we ever want to explore the functions in a package, along with examples, we can either go to the R documentation website or type ‘??PackageName’ into the Console, which will then populate the Help tab of the Viewer Window with information on the package.
Let’s try installing and loading a few packages for practice: naniar, report, tidyverse, dplyr, and ggplot2. (The tidyverse already includes dplyr and ggplot2, so listing them separately does no harm.)
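The simplest approach is `install.packages()` once per package followed by `library()` once per package. As a slightly tidier sketch, the chunk below installs only the packages that are missing and then loads all five. It assumes an internet connection. The `repos` URL points at CRAN's cloud mirror and is only needed if R has not already been told which mirror to use (RStudio normally sets one for you).

```r
pkgs <- c("naniar", "report", "tidyverse", "dplyr", "ggplot2")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

# Install only the missing packages
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# library() normally takes a bare name; character.only = TRUE lets it take a string
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```

Re-running this chunk is harmless: anything already installed is skipped, and loading an already-loaded package does nothing.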
Swirl is a really cool package in R that teaches you R programming and data science interactively, at your own pace, right in the R console! Complete modules 1 (Basic Building Blocks) through 7 (Matrices and Data Frames) of the “R Programming: The basics of programming in R” course in swirl. Some of it will make sense, and some of it won’t, but I think swirl does a pretty good job of orienting people to how basic operations in R work, which is especially helpful before we start working with any actual data.
Give this a try, and we can talk through any problems people ran into during our next workshop. I’ve attached some screenshots below demonstrating how to install and load swirl.
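If you prefer typing to following screenshots, here is a minimal sketch of the same steps. It assumes an internet connection, and the `repos` URL (CRAN's cloud mirror) is only needed if no mirror is set. swirl itself is interactive, so start it by typing `swirl()` in the Console rather than inside a chunk.

```r
# Install swirl only if it is not installed yet
if (!requireNamespace("swirl", quietly = TRUE)) {
  install.packages("swirl", repos = "https://cloud.r-project.org")
}

library(swirl)   # load swirl for this session

# Next, type swirl() in the Console and follow the prompts to pick a course
```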
A working directory is a fancy term for the default location where R looks for files you want to load and where it puts any files you save. Like any other language or program, R needs to be told where the data we’d like to work with is located on our computer. It doesn’t just know automatically. Here’s how to check where your current working directory is using the getwd() command.
Using the list.files() command will show you what files exist in your current working directory.
getwd() #get your current working directory
## [1] "/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main"
list.files() #Use list.files() to check the contents of your working directory
## [1] "CABLAB_R_online.Rmd" "CABLAB_R.Rmd" "datasets"
## [4] "exercise_solutions" "images" "index.html"
## [7] "misc" "R memes" "README.md"
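You can also change the working directory with `setwd()`, and narrow `list.files()` to certain file types with its `pattern` argument. The sketch below uses `tempdir()`, a scratch folder R creates for each session, so it won't touch your own files. The file name `example.csv` is hypothetical. One caveat: in an R Markdown document, a `setwd()` inside a chunk generally only applies to that chunk, which is one reason saving a filepath in an object, as we do next, can be handier.

```r
# Remember where we started so we can come back afterwards
old_wd <- getwd()

# Move into the session's temporary folder
setwd(tempdir())
getwd()

# Create an empty example file, then list only files ending in .csv
file.create("example.csv")
csv_files <- list.files(pattern = "\\.csv$")
csv_files

# Return to the original working directory
setwd(old_wd)
```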
To work with our data, we’ll have to tell R where the files are located. To keep this simple, we can create a new object containing the filepath, so we aren’t writing it out multiple times. I’m using a Windows computer, so nearly everything is contained within my C:/ drive. If I were on a Mac, I’d start with a forward slash. If I weren’t sure of my path, R makes it relatively easy to find.
If I start by entering this: Path <- "/
I can press Tab when my cursor is just after the slash to see a list of directories contained within my C:/ drive.
# For Windows
Path <- "C:/"
# For Mac
Path <- "/"
Here’s an example of what you should see:
Pressing Tab again will enter a directory and show its contents. From there, I can keep hitting Tab until I reach the directory, or folder, that contains the files I want to work with. I can then save this filepath, which is what we call a string (i.e., text rather than a numeric value), as an object named Path. We do so by placing the object’s name on the left of an equal sign (=) or an arrow (<-) and the value that the object should take on the right side.
# For Windows
Path <- "C:/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main/datasets/"
# For Mac
Path <- "/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main/datasets/"
This format of assigning a value to an object is really important and we’ll keep coming back to it throughout this tutorial.
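Here is a short sketch of assignment in action. `x`, `y`, `total`, and `full_path` are illustrative names, and `example.csv` is a hypothetical file name. It also shows why saving `Path` pays off: `paste0()` glues strings together with no separator, so the trailing slash in `Path` lets us build a full filepath without retyping the folder.

```r
# Both the arrow and the equal sign assign a value to an object
x <- 5
y = 10
total <- x + y    # objects can be used in later calculations
total

# A filepath is a string, which R calls "character"
Path <- "/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main/datasets/"
class(total)
class(Path)

# Build a full filepath from the saved folder plus a (hypothetical) file name
full_path <- paste0(Path, "example.csv")
full_path
# With a real file, you could then read it in, e.g. read.csv(full_path)
```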
For the purposes of this project, we are going to work with the Fright Night dataset! The Fright Night project took place in 2021 at Eastern State Penitentiary’s annual “Halloween Nights” haunted house event in Philadelphia. A total of 116 people (n = 116) completed a haunted house tour as part of a research study assessing the relationship between threat and memory. Specifically, we explored two main research questions: 1) How does naturalistic threat affect memory accuracy? and 2) Does naturalistic threat affect the way in which we communicate our memories?
Participants toured four haunted house segments (Delirium, Take 13, Machine Shop, and Crypt) that included low-threat and high-threat segments. Delirium and Take 13 were low-threat segments, whereas Machine Shop and Crypt were high-threat segments.
There were also 3 experimental conditions: Control, Share, and Test.
Control condition: Participants were instructed to tour the haunted house segment as they normally would.
Share condition: Participants were instructed to tour the haunted house segment in anticipation of an opportunity to post about their experience on social media afterwards.
Test condition: Participants were instructed to tour the haunted house segment in anticipation of being tested on their knowledge of the haunted house segment afterwards.
For the first two segments (Delirium and Take 13), all participants toured the segment in the Control condition. In the last two segments (Machine Shop and Crypt), one group of participants toured both segments in the Control condition, a second group toured Machine Shop in the Share condition and Crypt in the Test condition, and a third group toured Machine Shop in the Test condition and Crypt in the Share condition.
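Since we'll be working in R anyway, here is a sketch of the design above as a small data frame, one row per group and segment. The labels "Group 1" to "Group 3" are hypothetical names for the three groups described above, and the table does not reflect how many participants were in each group or how the actual dataset is laid out.

```r
segments <- c("Delirium", "Take 13", "Machine Shop", "Crypt")
threat   <- c("low", "low", "high", "high")

design <- data.frame(
  group     = rep(c("Group 1", "Group 2", "Group 3"), each = 4),
  segment   = rep(segments, times = 3),
  threat    = rep(threat, times = 3),
  condition = c(
    "Control", "Control", "Control", "Control",  # Group 1: Control throughout
    "Control", "Control", "Share",   "Test",     # Group 2: Machine Shop = Share, Crypt = Test
    "Control", "Control", "Test",    "Share"     # Group 3: Machine Shop = Test, Crypt = Share
  )
)
design

# Cross-tabulate segments by condition to check the counterbalancing
table(design$segment, design$condition)
```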
After completing the haunted house tour, participants were assessed at two time points: immediately afterwards and again one week later. During the immediate assessment, participants completed a recency discrimination task and freely recalled their memory for one low-threat and one high-threat haunted house segment. During the one-week delayed assessment, participants completed a recency discrimination task and freely recalled their memory for all haunted house segments. Check out the study design below, as well as the vignette illustrating when the three experimental conditions (i.e., Control, Share, and Test) took place throughout the haunted house tour.